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Abstract 

Background: Extensive studies on heterosis in plants using transcriptome analysis have identified differentially 
expressed genes (DEGs) in F] hybrids. However, it is not clear why yield in heterozygotes is superior to that of the 
homozygous parents or how DEGs are produced. Global allele-specific expression analysis in hybrid rice has the potential 
to answer these questions. 

Results: We report a genome-wide allele-specific expression analysis using RNA-sequencing technology of 3,637-3,824 
genes from three rice Ft hybrids. Of the expressed genes, 3.7% exhibited an unexpected type of monoallelic expression 
and 23.8% showed preferential allelic expression that was genotype-dependent in reciprocal crosses. Those 
genes exhibiting allele-specific expression comprised 42.4% of the genes differentially expressed between Ft hybrids 
and their parents. Allele-specific expression accounted for 79.8% of the genes displaying more than a 10-fold 
expression level difference between an Ft and its parents, and almost all (97.3%) of the genes expressed in F 1# 
but non-expressed in one parent. Significant allelic complementary effects were detected in the Ft hybrids of 
rice. 

Conclusions: Analysis of the allelic expression profiles of genes at the critical stage for highest biomass 
production from the leaves of three different rice F] hybrids identified genotype-dependent allele-specific 
expression genes. A cis-regulatory mechanism was identified that contributes to allele-specific expression, leading to 
differential gene expression and allelic complementary effects in F] hybrids. 

Keywords: Allele-specific expression, Complementary effects, Differentially expressed genes, Genotype-dependent 
monoallelic expression, Rice hybrids 



Background vigor. The "dominance" hypothesis proposes that the 
Heterosis, or hybrid vigor, refers to the superior bio- detrimental allele from one parent is complemented by 
logical functions of Fi hybrids compared with their par- the superior allele from the other parent, and that the ac- 
ental homozygous or inbred lines. This phenomenon cumulated superior alleles in the Fi hybrids give rise to 
was first described by Charles Darwin and was later inde- heterosis. By comparison, the "overdominance" hypothesis 
pendently rediscovered by George H. Shull and Edward signifies that hybrid vigor results from the interaction be- 
M. East in 1908 [1-3]. Although it is not well understood tween alleles brought together in the hybrid [4], 
at the molecular level, heterosis has been exploited To attempt to discriminate between these hypotheses, 
over the past half-century in plant and animal breed- extensive studies of gene effects and transcriptomics 
ing. Two classic hypotheses, "dominance" and "over- have been conducted [5-11]. Genetic analyses have re- 
dominance", have been proposed to explain hybrid vealed the genetic effects of additive, overdominance, 

dominance, and epistasis, and that interactions between 

— . - different loci are associated with heterosis in different 
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have indicated that DEGs commonly occur in inbred 
lines as well as between F x hybrids and their parents. 
For example, 4-18% of maize genes are differentially 
expressed in different tissues of the maize inbred lines 
B73 and Mo 17 according to microarray-based analyses 
[12]. In Arabidopsis seedlings, high-density single nu- 
cleotide polymorphism (SNP) analysis showed that 
31% of all analyzed genes were differentially expressed 
between the parental inbred lines [13], while, in an- 
other study, 10.6% of genes were differentially expressed 
in different tissues of the hybrid rice LYP9 and its 
parents [14]. 

Recently, high-throughput RNA-sequencing technol- 
ogy revealed that 4- week-old shoots of the 93-11 and 
Nipponbare rice varieties had 24.0% DEGs, as did their 
reciprocal hybrids [15]. Moreover, a newly published glo- 
bal survey based on RNA sequencing technology found 
that approximately 70% of all expressed genes were dif- 
ferentially expressed between the two maize inbred par- 
ents B73 and Mol7, and that 42-55% were differentially 
expressed between the reciprocal F x and its parents [16]. 
Although the molecular basis of heterosis has been 
attributed to the DEGs in the above hybrids, the 
underlying mechanism(s) causing differential expres- 
sion remain unknown. 

Several studies have shown that only one allele is 
expressed in heterozygotes [17-20], and monoallelic ex- 
pression or an imbalance in heterozygote allelic expres- 
sion has been extensively studied in humans and other 
mammals [21-23]. Transcription profile analyses have 
indicated that monoallelic expression could be caused by 
X-chromosome silencing, autosomal imprinting, or ran- 
dom events. Some studies with vegetative tissues from 
maize F 2 hybrids identified several genes exhibiting allele- 
specific expression (ASE) [12,24,25], which differed mark- 
edly between the different F 2 hybrids and was altered in 
response to environmental stress. This could contribute to 
heterosis. 

The objectives of the present study were to explore 
global ASE in hybrid rice and to reveal the mechanism 
of differential expression in F 2 hybrids using RNA se- 
quencing. Three elite rice varieties were chosen that met 
the breeding objectives from different periods in China, 
Guangluai #4 (GL, 1970- 1980s), Teqing (TQ, 1980- 
1990s), and 93-11 (1990s to present), plus their F x hy- 
brids, which we show have different levels of heterosis. 
Two Fi hybrids, GL x TQ and GL x 93-11, exhibited 
high heterosis, and the third, 93-11 x TQ, low heterosis. 
To obtain sufficient SNPs to distinguish the maternal 
and paternal alleles in F 2 hybrids, the genomes of the 
three parents were re-sequenced. To identify more SNPs 
for further ASE analysis, nuclear RNA was extracted 
from leaves of the three F 2 hybrids and their parents and 
subjected to Illumina RNA-Seq technology. We identify 



a global ASE profile that reveals a potential mechanism 
for an increased biomass-based, grain-yield heterosis. 

Results 

Global ASE analysis by RNA sequencing 

To obtain sufficient SNPs for ASE analysis, we achieved 
a rice genome coverage of 17.7-27.7 fold to satisfy the 
minimum requirement of obtaining more than 90% 
SNPs [26]. A total of 76.2-119.0 million reads (100 bp 
per read) from three rice varieties serving as parents for 
the F x s were obtained using Illumina DNA-Seq technol- 
ogy (Additional file 1: Table SI), and 375,744-411,571 
SNPs were detected between each parent with 98.0% ac- 
curacy (Additional file 2: Figure SI A, Additional file 3: 
Table S2). A total of 89.5-114.1 million reads (90 bp per 
read) were obtained from analogous tissue from the 
three F 2 hybrids. Analysis disclosed that 29,064-29,928 
genes at the secondary branch differentiation stage were 
expressed in the three F x hybrids and their parents 
(Additional file 1: Table SI). The expression levels of 30 
genes in the ¥ 1 hybrids and their parents were also de- 
termined, and quantitative real time polymerase chain 
reaction (qRT-PCR) and RNA-sequencing were shown 
to be highly correlated in terms of their analyses (expres- 
sion level correlation coefficient r = 0.92-0.99 (Additional 
file 4: Table S3)). We found that 41,416-43,685 of the 
SNPs located in gene bodies had an average read cover- 
age of 8.6-10.9 in the F 2 hybrids. These SNPs were avail- 
able for allelic expression analysis (Additional file 2: 
Figure SIB and Additional file 5: Table S4). Of these, 
9,752-10,818 genes accounting for 33.5-36.1% of the 
total number of expressed genes containing SNPs could 
be used for distinguishing the alleles (Additional file 6: 
Figure S2). A total of 3,627-3,824 genes met the criter- 
ion of at least 10 SNP reads, so could be used for further 
ASE profile analyses. 

Through these ASE profile analyses, the F 2 hybrids 
were classified into three categories: monoallelic expres- 
sion, in which only one allele from either the maternal 
or paternal parent is expressed; preferential allelic ex- 
pression, in which expression levels differ by more than 
two-fold between alleles; and biallelic expression, in 
which expression levels vary by less than two-fold be- 
tween alleles (Additional file 7: Figure S3). We found 
that 3.4-3.9% of the genes were monoallelic expression, 
23.5-24.2% were preferential allelic expression, and 
72.0-73.0% were biallelic expression (Figure 1A-C). Fur- 
ther analysis of paternally and maternally derived read 
coverage in each heterozygote showed that the two par- 
ents contributed equally to the F 2 hybrids. Our results 
are consistent with those from Arabidopsis embryos 
[27], and suggest that no obvious parent-of-origin effect 
occurred in the vegetative tissues of the rice hybrids 
(Figure 1A-C and Additional file 8: Figure S4). 
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■ Monoallelic ■ Monoallelic ■ Monoallelic ■ Monoallelic ■ Monoallelic ■ Monoallelic 
expression (GL) expression (TQ) expression (93-11) expression (GL) expression (93-11) expression (TQ) 

Preferential allelic ■ Preferential allelic Preferential allelic ■ Preferential allelic Preferential allelic ■ Preferential allelic 
expression (GL) expression (TQ) expression (93-11) expression (GL) expression (93-11) expression (TQ) 



Biallelic expression ■ Biallelic expression Biallelic expression 

GLxTQ GLx93-11 93-1 1xTQ 

Figure 1 Overall allele-specific expression (ASE) profiles in three F n populations. The three types of ASE genes and their proportions 
detected in GLxTQ (A), GLx 93-11 (B), and 93-11 xTQ (C). 



A cis-regulation mechanism and genotype-dependent 
monoallelic expression 

To understand the relationship between gene expression 
in the parents and ASE in the ¥ 1 hybrids, the correlation 
between these parameters was determined for 3,627- 
3,824 genes, yielding correlation coefficients in the range 
of 070-076 (Figure 2A, P<2.2E-16). A higher correl- 
ation coefficient was obtained from the same analysis with 
2,026 ASE genes in the F 2 hybrids (Figure 2A, Pearsons 
r>0.80, P<2.2E-16). These results suggest that a cis- 
regulatory mechanism is occurring, that is, if the allele is 
transcribed in the parent it is also transcribed in its F 2 
hybrids, and if not in the parent then not in its F 2 hybrids. 
Our data also indicate that only 5.4-19.3% of gene expres- 
sion in Fi hybrids is non-additive. Trans-acting regu- 
lation may thus also contribute to the regulation of gene 
expression in rice hybrids, but would not have a major 
effect. 

We identified 413 monoallelic expression genes in the 
three F 2 hybrids (143 from GL x TQ, 129 from GL x 93- 
11, and 141 from 93-11 x TQ) (Additional file 9: Table 
S5). Of these, 108 were common to two F x hybrids, but 
none were common to all three (Figure 2B). We ran- 
domly chose 134 monoallelic expression genes, account- 
ing for 32.4% of the total, from the three hybrids and 
used reverse-transcription (RT)-PCR sequencing to ver- 
ify a reliability of 91.8% (123/134) (Figure 2C, Additional 
file 10: Table S6). Through further investigation of allelic 
expression patterns in three reciprocal crosses, GL x 93- 
11 and 93-11 x GL, GL x TQ and TQ x GL, and 9311 x TQ 
and TQx 93-11, we detected 109 monoallelic expression 



genes and 14 preferential allelic expression genes and found 
that all monoallelic expression and preferential allelic ex- 
pression genes tested exhibited a genotype-dependent ex- 
pression pattern, while 17 biallelic expression genes showed 
no difference between reciprocal crosses (Figure 2C-F, 
Additional file 11: Table S7). This shows that, regard- 
less of the paternal or maternal origin in the reciprocal 
crossings, monoallelic expression and preferential al- 
lelic expression genes always express the allele from a 
given parent (Figure 2C and E). Combining our results 
with previous observations from maize [24], we sug- 
gest that a hitherto overlooked type of monoallelic ex- 
pression occurs in eukaryotic organisms. 

ASE results in transcriptome divergence in the Ft hybrids 

Previous studies have demonstrated that higher levels of 
heterosis are associated with greater differences between 
the agronomic and/or metabolic traits of parents [28-30], 
and that DEGs (fold change >2.0, false discovery rate 
(FDR) <0.05) between F 2 hybrids and their parents are 
among the major factors leading to heterosis [14,31,32]. 
To ascertain the extent to which monoallelic expression 
and preferential allelic expression genes contribute to het- 
erosis, we dissected the global DEGs between F 2 hybrids 
and their parents. The total number of DEGs in the three 
hybrids was correlated with heterosis level of both fresh 
and dry mass (r > 0.99, P < 0.03; Figure 3A-B). 

To explore which mechanism(s) create DEGs in het- 
erozygotes, we found that 95.1% of monoallelic expres- 
sion genes, 50.3% of preferential allelic expression genes, 
and 30.4% of biallelic expression genes are DEGs (Figure 3C). 
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Figure 2 The cis-regulatory mechanism and genotype-dependent monoallelic expression. (A) Correlation analysis between genie 
expression in the parental generation and allelic expression in the F-\ generation of the three hybrids. Red, allelic-specific expressed genes (ASE); 
blue, biallelic expressed genes. (B) The number of variety-specific and commonly expressed monoallelic expressed genes in the three F] hybrids. 
(C) Example of a monoallelic expression gene confirmed by RT-PCR sequencing of reciprocal Ft crosses. (D) Confirmation of genotype-dependent 
monoallelic expression patterns in the three F] hybrids showing the origin of the alleles. (E) Example of a preferential allelic expression gene 
confirmed by RT-PCR sequencing of reciprocal F] crosses. (F) Example of a biallelic expression gene confirmed by RT-PCR sequencing of reciprocal 
Fi crosses. 



Surprisingly, we discovered that the monoallelic expression 
and preferential allelic expression genes, comprising 27.5% 
of the total analyzed genes, amounted to 42.7% of the DEGs 
(Figure 3D). More specifically, the monoallelic expression 
genes, accounting for 3.7% of the total analyzed genes, gave 
rise to 10.0% of the DEGs, the preferential allelic expression 
genes (23.8%) amounted to 32.4%, and the biallelic expres- 
sion genes (72.3%) were composed of only 57.6% of the 
DEGs (Figure 3D). By determining contributions to DEGs 
between Fi hybrids and their parents, we found that 57.2% 
of DEGs from monoallelic expression, 11.4% from preferen- 
tial allelic expression, and 3.9% from biallelic expression ex- 
hibited a considerably greater than 10-fold difference. By 
contrast, 79.7% of DEGs from biallelic expression, 68.6% 
from preferential allelic expression, and 32.1% from monoal- 
lelic expression genes displayed a less than four-fold 



difference between the F 2 and their parents (Figure 3E). 
The 71.6-83.5% of the genes that were expressed in 
the F^ but silenced in one of the two parents showed 
monoallelic expression patterns (Figure 3F). These re- 
sults demonstrate that the majority of DEGs (>10 fold 
difference) are attributable to ASE, and that monoalle- 
lic expression genes in particular play an important 
role in gene expression divergence between ¥ 1 hybrids 
and their parents. 

Complementary effects are mainly contributed by ASE 
genes 

Because the presence of a cis-regulatory mechanism of 
allelic expression would be in accordance with the "dom- 
inance" hypothesis, we analyzed the transcriptomic pro- 
files of all ASE genes in the three F 2 hybrids and their 
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Figure 3 Contributions of the three allelic expression types of genes to the differentially expressed genes (DEGs). (A) The mid-parent 
heterosis level of total biomass at the secondary branch differentiation stage exhibited by the three Ft hybrid populations. Heterosis was evaluated 
using the fresh weight and dry weight of single plants from each F] hybrid generation and their parents. (B) The preferential allelic expression genes 
compared between each F] hybrid and its parents. (C) The percentages of DEGs represented by the genes of each allelic expression type. (D) The 
contributions of monoallelic expression, preferential allelic expression, and biallelic expression genes to the total genes and DEGs in the 
three F n s. (E) The proportions of genes with monoallelic expression, preferential allelic expression, and biallelic expression profiles in the F] 
generation compared with their parents, sub-grouped according to fold differences in expression level. (F) The proportions of monoallelic 
expression, preferential allelic expression, and biallelic expression genes expressed in the Ft generation and one of the two parents but not 
expressed in the other parent. 



parents. A large number of genotype-dependent monoal- 
lelic expression genes in F x hybrids would also be in ac- 
cordance with the "dominance" hypothesis. The results 
showed that an average of 51.6% (53.1% in GL x TQ, 
55.0% in GL x 93-11, and 46.8% in 93-11 x TQ) of genes 
expressed in one parent and non-expressed in the other 



were categorized as monoallelic expression genes in F 2 hy- 
brids (Figure 4A class IV and Additional file 12: Table S8), 
whereas 30.2% (29.4, 27.1, and 34.0% in GL x TQ, GL x 
93-11, and 93-11 x TQ, respectively) of genes expressed at 
low levels in either parent, but enhanced in F 2 hybrids, were 
categorized as monoallelic expression genes (Figure 4A class 
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Figure 4 Complementary effects of allele-specific expression genes contributing to transcriptome optimization of F n hybrids. The left 
two lanes of each panel show the allelic expression patterns observed in the F q s, whereas the right three lanes compare the expression levels of 
these genes in the hybrids (lane 4) to those in the parents (lanes 3 and 5). The genes in groups I, II, III, and IV are those with no difference 
between the parents, a 2- to 10-fold difference between the parents, a greater than 10-fold difference between the parents, and expression in 
only one parent, respectively. (A) The expression levels of monoallelic expression genes in the Ft hybrids and their parents. (B) The expression 
levels of preferential allelic expression genes in the F] hybrids and their parents. (C) The expression levels of biallelic expression genes in the F] 
hybrids and their parents. The short bands on the same horizontal line indicate the same gene. The expression level of each gene was normalized 
by log 10 . The vertical bars on the right correlate color in the panels with relative levels of transcription. 
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III and Additional file 12: Table S8). Most of both types of 
monoallelic expression genes exhibited a mid-parent expres- 
sion level (Additional file 12: Table S8). Therefore, the alleles 
only expressed in the F 2 were those expressed in the inbred 
parent, while the alleles silenced in the inbred parent were 
also silenced in This means that the total expression level 
of the gene in the F 2 is one half that in the parent, which is 
equivalent to a mid-parent expression level (Figure 4A class 

IV and Additional file 12: Table S8). This dosage effect im- 
plies that a cis-regulatory mechanism is acting. 

Moreover, we found that more monoallelic expres- 
sion genes were downregulated than upregulated in F 2 
(Additional file 13: Table S9), and that 9.7% of prefer- 
ential allelic expression genes exhibited the same pat- 
terns as monoallelic expression genes in the F 2 hybrids 
(Figure 4B classes III and IV). The analysis of genes 
within the categories of activated and enhanced ex- 
pression in F x hybrids found complementary effects of 
superior alleles from both parents with average values 
of 73.8% and 93.6% for class III and class IV genes, re- 
spectively. By contrast, a biallelic expression pattern 
was exhibited by only 26.2% and 6.4% in class III and 
class IV genes, respectively (Figure 4C and Additional 
file 14: Table S10). The proportion of genes with comple- 
mentary effects in GL x TQ and GL x 93-11 was higher 
than that in 93-11 x TQ (7.8% and 7.5% versus 4.7%, re- 
spectively), which was consistent with the heterosis level 
of these F 2 hybrids (Additional file 14: Table S10). Our 
data imply that ASE genes, most notably monoallelic ex- 
pression genes, are the main contributors to allelic com- 
plementary effects in hybrid rice. 

ASE genes have diverse biological functions in hybrids 

To ascertain the molecular and biological functions of 
monoallelic expression and preferential allelic expression 
genes, we performed gene ontology (GO) analysis. Four 
GO terms for molecular functions, nucleotide binding, 
receptor activity, protein binding, and kinase activity, 
were commonly found among both monoallelic expres- 
sion and preferential allelic expression genes (Table 1 
and Additional file 15: Tables Sll and Additional file 16: 
Table SI 2). These functions indicate that monoallelic ex- 
pression genes are mainly involved in protein modifica- 
tion, signal transduction, and response to endogenous 
stimulus pathways. A wider diversity of functions, in- 
cluding biosynthesis, morphogenesis, and carbohydrate 
metabolism, was found for preferential allelic expression 
genes (Table 1 and Additional file 17: Tables S13 and 
Additional filel8: Table S14), suggesting that ASE genes 
have important roles in biosynthesis, development, and 
their regulation. To distinguish between the enrichment 
of monoallelic expression and preferential allelic expres- 
sion gene functions and that of all genes for ASE ana- 
lysis, we performed GO analysis of 3,627-3,824 genes in 



three F 2 hybrids. Limited GO terms were common to 
monoallelic expression and preferential allelic expression 
genes in different F 2 hybrids (Additional files 19: Figures 
S5 and Additional file 20: Figure S6), which might be a 
consequence of specific monoallelic expression and pref- 
erential allelic expression genes occurring in different F x 
hybrids, suggesting that allele-specific expression genes 
have different roles in different genomic backgrounds. 

Discussion 

Previous studies of transcriptome analyses related to het- 
erosis have mainly been conducted using microarray 
analyses, and RT-PCR and RNA-sequencing technology 
[15,31,33]. Here, we combined RNA-sequencing with 
DNA re-sequencing technology to establish ASE assays 
and achieved a 20-fold coverage of rice genome re- 
sequencing data to identify SNPs. A strict statistical cut- 
off for SNP calling enabled fully quantitative analyses of 
both overall and allele-specific gene expression profiles 
to be obtained for rice leaves at the stage of secondary 
branch differentiation. These data were verified by PCR- 
sequencing and RT-PCR sequencing using analogous 
materials planted in the following year. This developing 
stage is important as grain yield is directly correlated 
with the biomass established early in vegetative growth 
[34]. SNP accuracies of 98.0% and 91.8% were con- 
firmed, indicating the reliability of both genome and 
transcriptome data, respectively. 

Monoallelic expression genes have been studied for 
nearly a half century in humans and other mammals 
[17-20,35,36]. Most cause X-chromosome silencing, 
autosomal imprinting, or random events [17-20,35,36], 
and contribute to dose-dependent gene expression, im- 
mune responses, and disease biogenesis, including sev- 
eral types of cancers [17-20,35-37]. ASE and regulation 
mechanisms have also been studied in humans and other 
animals [23,38,39]; however, by contrast, monoallelic ex- 
pression is poorly understood in higher plants. Different 
studies have reported a high proportion of ASE genes in 
maize hybrids (50% and 60%) with cis-regulatory effects 
underlying the ASE [24,40], compared with a more lim- 
ited number of monoallelic expression genes [25]. Other 
studies using F 2 hybrids of japonica-indica, maize, and 
Arabidopsis focused on endosperm-localized genes and 
identified more than 100 imprinted genes [41-44]. How- 
ever, no large-scale ASE analysis has previously been 
carried out in rice hybrids. 

We identified 413 monoallelic expression and 2,659 
preferential allelic expression genes in the three F 2 hy- 
brids via a global transcriptome analysis. Of the total 
genes analyzed, 3.4-3.9% exhibited monoallelic expres- 
sion and 23.5-24.2% exhibited preferential allelic expres- 
sion patterns. The proportion of ASE genes, moreover, 
did not differ significantly in the three F x hybrids, which 
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Table 1 The common functions of monoallelic expression and preferential allelic expression genes in the three rice 
hybrids 





GO Term 
Molecular function 


GLxTQ 
Genes P value 


GLx 93-11 
Genes P value 


93-11 xTQ 
Genes P value 


Monoallelic expression genes 


Nucleotide binding 


26 


1 .87E-08 


27 


2.20E-10 


37 


1.52E-14 




Receptor activity 


8 


3.77E-05 


12 


1.23E-09 


7 


5.85E-04 




Protein binding 


25 


5.51E-10 


41 


1 .70E-26 


46 


1 .98E-26 




Kinase activity 


22 


5.65E-09 


22 


4.85E-10 


28 


3.30E-12 


Preferential allelic expression genes 


Nucleotide binding 


252 


4.38E-48 


300 


1 .27E-66 


220 


3.26E-30 




Receptor activity 


59 


3.55E-18 


77 


2.05E-28 


25 


0.02 




Protein binding 


231 


1 .09E-55 


276 


4.19E-76 


185 


3.07E-29 




Kinase activity 


180 


7.10E-37 


235 


1 .27E-62 


142 


8.13E-18 




Transcription factor activity 


163 


1 .20E-27 


127 


3.67E-1 1 


158 


2.33E-23 




Structural molecule activity 


63 


4.19E-11 


89 


2.71 E-22 


59 


1 .02E-08 




Transporter activity 


112 


2.22E-14 


105 


5.37E-10 


119 


1.25E-15 




Carbohydrate binding 


7 


0.03 


9 


0 


8 


0.01 




Biological function 














Monoallelic expression genes 


Protein modification 


27 


3.10E-11 


27 


1.35E-12 


36 


9.22E-17 




Signal transduction 


31 


3.16E-13 


27 


1.48E-11 


43 


2.53E-21 




Response to endogenous stimulus 


28 


8.58E-10 


24 


2.49E-08 


40 


6.32E-17 


Preferential allelic expression genes 


Protein modification 


221 


2.89E-48 


235 


4.77E-50 


175 


8.30E-24 




Signal transduction 


238 


6.76E-50 


266 


1 .24E-58 


182 


1.07E-21 




Biosynthesis 


125 


3.00E-10 


147 


1.38E-14 


90 


0.023505 




Morphogenesis 


32 


2.70E-10 


39 


7.50E-14 


23 


4.21E-05 




Response to endogenous stimulus 


264 


1.63E-54 


265 


1.75E-48 


207 


3.00E-25 




DNA metabolism 


49 


4.49E-05 


42 


0.010399 


46 


7.96E-04 




Protein biosynthesis 


40 


2.48E-06 


51 


5.48E-10 


45 


8.44E-08 




Amino acid and derivative metabolism 


86 


6.11E-15 


79 


2.05E-10 


61 


3.23E-05 




Lipid metabolism 


67 


7.99E-1 1 


63 


5.17E-08 


45 


0.003092 




Response to stress 


172 


2.38E-24 


165 


2.76E-18 


204 


5.86E-36 




Catabolism 


76 


1.96E-19 


43 


3.72E-04 


58 


1 .06E-09 




Response to external stimulus 


58 


5.92E-1 1 


56 


6.79E-09 


62 


5.63E-12 




Response to biotic stimulus 


107 


1.36E-13 


143 


4.41 E-26 


149 


2.22E-30 




Response to abiotic stimulus 


141 


9.70E-23 


140 


2.49E-19 


146 


4.98E-23 




Pollination 


10 


0.001166 


20 


2.35E-10 


11 


4.48E-04 




Flower development 


30 


3.36E-06 


44 


1.54E-12 


42 


8.04E-12 




Cell organization and biogenesis 


39 


1.81E-04 


33 


0.019878 


31 


0.033566 




Cell differentiation 


44 


2.43E-1 1 


51 


4.52E-14 


36 


6.69E-07 




Secretory pathway 


81 


0.003356 


93 


2.74E-04 


90 


3.59E-04 



is similar to a previous report in maize hybrids [24]. Im- 
portantly, our results indicated that all monoallelic ex- 
pression and preferential allelic expression genes tested 
exhibited genotype-dependent expression patterns in re- 
ciprocal crosses, which contrasts with the findings of 
monoallelic expression genes in humans and other 
mammals [21,45]. The observed genotype-dependent 



ASE in the vegetative tissue of hybrid rice could repre- 
sent a common mechanism of allelic complementary ef- 
fects in hybrids, and show the importance of parental 
genotype in both crossbreeding and hybrid breeding. 
Given the number of genes with genotype-dependent 
monoallelic expression involved in a wide range of GO 
categories that play important biological functions, many 
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potential opportunities exist for them to contribute to 
DEGs and to produce the diverse phenotypes of the T x 
hybrids. 

In the present study, the genotype-dependent ASE 
genes might confer a fitness advantage to the heterozy- 
gous state relative to either of their homozygous parents. 
As most monoallelic expression genes were silenced or 
suppressed in one of the parents, we speculate that 
genotype-dependent monoallelic expression might be 
the consequence of artificial parent selection by breeders. 
To meet the breeding objectives, "superior alleles" are ac- 
cumulated in the elite parent for long-term selection 
(Figure 4A). Such alleles could maintain allelic diver- 
sity in different varieties and enlarge the allele differ- 
ence in the rice germplasm. Therefore, our findings of 
allelic complementary effects in F x hybrid rice could 
guide the selection of elite parents through comple- 
menting a variety of the superior alleles. 

Previous studies have indicated that differential gene 
expression is common between F x and parents, and that 
it is a major contributor to heterosis at the transcription 
level [14,31]. For example, hundreds of differentially 
expressed genes were detected at different developmen- 
tal stages between the elite hybrid rice LYP9 and its par- 
ents [14], and analogous results were obtained from 
studies with Arabidopsis and maize [28,32,46,47]. Our 
study also found that the percentage of DEGs between 
Fi hybrids and parents exactly correlated with the heter- 
osis level of each F 2 hybrid (Figure 3 A and B). 

Such results do not divulge, nonetheless, how the al- 
leles from inbred lines become the DEGs in heterozy- 
gotes, although the mechanism behind this is revealed to 
some extent by genome-wide, ASE analysis using high- 
throughput RNA sequencing. Moreover, in the present 
study, the most significant DEGs between the F 2 and 
parent were expressed in the ¥ 1 but silenced in one of 
the parents, accounting for 93.7% of the DEGs associ- 
ated with ASE. Of these, 15.8% showed preferential al- 
lelic expression and 77.8% were monoallelic expression. 
Altogether, 27.5% of the ASE genes contributed to 42.4% 
of the total number of DEGs. Although the proportion 
of ASE genes was similar in the three F 2 hybrids, the 
total numbers of DEGs differed, with fewer detected in 
93-11 x TQ (Figure 3B) with its lower heterosis level 
(Figure 3 A) and more found in GL x TQ and GL x 93-11 
with their higher heterosis levels. The same results were 
also found between their parents, suggesting that transcrip- 
tome divergence in F 2 hybrids is attributed to transcrip- 
tome divergence between parents. Crucially, increased 
parent allelic variation could be an important strategy for 
maintaining higher heterosis levels in hybrid rice breeding 
programs through enlarging the transcriptome diversity 
between parents, which have accumulated different sets of 
superior alleles. 



A recent study has provided molecular evidence for a 
single gene model to support the "overdominance" hy- 
pothesis of heterosis in tomato hybrids [48]. Because of 
technological limitations, however, the maternal and pa- 
ternal alleles in the F x hybrids could not be distinguished 
effectively. The recent development of high-throughput 
sequencing technology provides the opportunity to study 
ASE in heterozygotes. Many genes have exhibited addi- 
tive expression patterns in F 2 hybrids in previous studies, 
and complementary effects at the gene expression level 
have been reported in hybrid maize [16]. Our data de- 
rived from global allelic expression profiles extend this 
result to hybrid rice, and also reveal the mechanism of 
DEG formation in heterozygotes. Our observed high 
correlation (>0.7) between allelic expression in ¥ 1 hy- 
brids and gene expression in parental lines indicates that 
a cis -regulated mechanism plays an essential role in al- 
lelic expression. Allelic complementation effects, more- 
over, can be the outcome of a cis-regulatory mechanism 
mainly contributed to by ASE genes. 

The findings of the present study support the "domin- 
ance" hypothesis for indica hybrid rice varieties and reveal 
that the consequences of complementation, primarily by 
genotype-dependent monoallelic expression and preferen- 
tial allelic expression alleles, are an accumulation of "su- 
perior" alleles that confer monoallelic expression and 
preferential allelic expression genes in F 2 hybrids. The ex- 
pression of these "superior" alleles offers an opportunity 
to optimize the transcriptomes that give rise to heterosis 
in ¥ 1 hybrids. Although our results revealed that allelic 
complementary effects played a major contribution to 
gene expression in hybrid rice and support the "domin- 
ance" hypothesis, they do not exclude the contribution of 
different mechanisms to heterosis in hybrid rice and other 
crops. 

Conclusions 

Allelic expression profiles in hybrid rice determined by 
RNA-sequencing technology demonstrated a type of 
genotype-dependent monoallelic expression genes in 
plants. DEGs between parents and F 2 hybrids were 
mainly attributable to ASE genes, which gave rise to the 
observed allelic complementary effects in F 2 hybrids. 

Methods 

Plant material and phenotypic analysis 

Reciprocal crosses were made between the three indica 
varieties, Guangluai #4 (GL), 93-11, and Teqing (TQ), 
at Hainan Island, China in the spring of 2009. The six 
reciprocal F x hybrids were planted together with the 
three parents in Wuhan, China in the summer of 2009. 
Triplicate plots, each containing 30 individuals, were 
planted for all nine genotypes. Heterosis levels were 
evaluated by measuring the fresh and dry biomass of the 
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above-ground parts of the plants at the secondary 
branch differentiation stage, which is critical for deter- 
mining the highest biomass and maximum grain yield. 
The mid-parent heterosis level was calculated as de- 
scribed by Ma et al [49]. Analog ous leaves from the 
hybrids and parents at the same development stages in 
2010 were used to verify gene and allelic expression and 
genotype-dependent monoallelic expression. 

Nuclear RNA extraction 

The second fully expanded leaves were harvested at the 
secondary branch differentiation stage, immediately fro- 
zen in liquid nitrogen and stored at -80°C. Material 
from the triplicate plots was pooled at harvest. Nuclei 
were isolated from -10 g of frozen leaves using the Plant 
Nuclei Isolation/Extraction Kit (Sigma, St. Louis, MO, 
USA). Total hnRNA was extracted from nuclei using 
Trizol (Invitrogen, Carlsbad, CA, USA) according to the 
manufacturers instructions, and treated with RNase-free 
DNase I (New England Biolabs, Ipswich, MA, USA) to 
remove any contaminating genomic DNA. 

Library construction 

The Illumina mRNA-Seq Sample Prep Kit (Illumina, San 
Diego, CA USA) was used to prepare the sequencing li- 
brary with 3 \ig of nuclear RNA. Fragmentation buffer in 
the kit was added directly to hnRNA to produce short 
fragments of 200-700 bp, which served as the templates 
for first-strand cDNA synthesis using random hexamers. 
Second-strand cDNA was synthesized followed the proto- 
col described in the kit and was purified using a QIAquick 
PCR Extraction Kit (Qiagen, Valencia, C A USA) and eluted 
in elution buffer (EB). The short fragments were then li- 
gated to sequencing adapters. Suitable fragments of ap- 
proximately 200 bp were selected as templates for 
amplification in a MyCycler PCR instrument (Bio-Rad, 
Hercules, CA USA) with the following program: de- 
naturation at 98°C for 30 s followed by 15 cycles of 98°C 
for 10 s, 65°C for 30 s, and 72°C for 30 s plus a terminal 
hold at 72°C for 5 min. The samples were then purified 
using the QIAquick PCR Purification Kit according to the 
manufacturers protocol and eluted in 30 \iL of EB. A l-|iL 
aliquot of the construct was loaded onto an Agilent Tech- 
nologies 2100 Bioanalyzer using the Agilent DNA 1000 
Chip Kit (Agilent, Santa Clara, CA USA). After verifying 
the size and purity of the DNA fragments, the library was 
sequenced using an Illumina GA II x platform by BGI 
(Shenzhen, China). 

The DNA Illumina TruSeq DNA sample preparation 
kit was used to prepare a genomic DNA library accord- 
ing to the manufacturer s protocol. Contaminating RNA 
was removed by treating with RNase A. The DNA sam- 
ples were sent to the Beijing Novo Gene Company 



(Beijing, China) for sequencing using an Illumina Hiseq 
2000 platform. 

Read alignment 

The raw reads were filtered before data analysis by re- 
moving reads consisting of adaptors only, those with 
greater than 10% unknown bases, and those in which 
more than half of the bases gave a quality score of less 
than 5.0. The remaining (clean) reads were mapped to 
the reference genome of Japonica variety Nipponbare 
(http://www.gramene.org/) using SOAP2 software [50]. 
Mismatches of no more than two bases were allowed in 
the alignment. We used the reads per kb per million 
reads (RPKM) method to calculate unique gene expres- 
sion levels [51]. 

SNP identification 

SOAP2 was used to align each read to the Nipponbare 
reference genome (http://www.gramene.org/), with no 
more than three mismatches to the candidate SNPs per- 
mitted for each read [50]. This method of SNP calling 
has been previously described [46]. A statistical model 
based on Bayesian theory and the Illumina quality sys- 
tem was used to calculate the probability of each pos- 
sible genotype at every position from the alignment of 
reads to the reference genome. Six criteria were set to 
filter out unreliable SNPs: 1) the read quality value must 
be no lower than 20, 2) the SNPs must be at least 5 bp 
from each other, 3) the SNP must be represented by at 
least four reads, 4) the sequencing depth must be less 
than 10,000, 5) the SNP must be more than 5 bp distant 
from an intron-exon junction and 6) the approximate 
copy number of the flanking sequences must be less 
than four. SNP sets from each biological replicate of GL 
versus TQ, GL versus 93-11, and 93-11 versus TQ were 
obtained and used for further allele-specific analysis. 

ASE analysis 

The paternal and maternal alleles expressed in the F x hy- 
brid transcriptomes were distinguished by their SNP 
nucleotype. The expression level was calculated based 
on at least 10 reads of single genes. The expression level 
from a paternal or maternal allele was calculated based 
on the number of reads for the given allele divided by 
the total number of reads for the SNP. When only one 
allele was expressed, the gene was categorized as show- 
ing monoallelic expression. When the allele expression 
level was biased toward one parent by more than two- 
fold, the gene was categorized as showing preferential al- 
lelic expression. When the two alleles were expressed 
equally (less than two-fold difference), the gene was cat- 
egorized as showing biallelic expression. In our com- 
puter analysis, the relative allele expression value of "0", 
which occurred when an allele was not expressed, could 
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not be computed. Therefore, to enable calculations, the 
"0" was replaced by 0.001. 

SNP confirmation and validation of allelic expression 

For SNP validation, primers flanking the SNP sites were 
designed to amplify 300-800 bp fragments. PCR amplifi- 
cation was performed using reaction mixtures contain- 
ing 10 ng of cDNA template and 5 [iM primer. PCR was 
performed using the My Cycler PCR system with the fol- 
lowing parameters: denaturation at 95°C for 5 min 
followed by 30 cycles of 95°C for 30 s, 50-55°C for 30 s, 
and 72°C for 30 s plus a terminal hold at 72°C for 5 min. 
PCR products were purified using the Axy PCR Cleanup 
Kit (AxyGEN Bioscience, Union city, CA USA) and se- 
quenced using an ABI 3730x1 machine. The resulting se- 
quences were compared with reference SNPs detected 
by genome sequencing. 

For monoallelic expression and preferential allelic ex- 
pression validation, cDNA and gDNA from each Fi sam- 
ple were used as PCR templates, and the sequences 
derived from the genome and transcriptome were com- 
pared. For preferential allelic expression and biallelic ex- 
pression validation, cDNA from reciprocal F 2 hybrids 
was used as PCR templates. The sequencing results for 
cDNA from reciprocal F 2 hybrids were compared to de- 
tect parent-of-origin effects. 

GO and statistical analyses 

GO analysis was performed using the open-source 
MAS3 database (http://bioinfo.capitalbio.com/mas3/). 
A threshold of a two-fold change in gene expression 
levels and a FDR of <0.05 were used to identify differen- 
tially expressed genes. The P-values and the FDRs of differ- 
entially expressed genes were calculated as described [52]. 

Gene expression level validation 

To validate the gene expression level, 30 randomly se- 
lected genes with different expression levels were verified 
by quantitative RT-PCR as described by Wang et al. [53]. 
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